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Abstract 



Understanding models which represent the invasion of network-based systems by 
infectious agents can give important insights into many 'real-world' situations, 
, including the prevention and control of infectious diseases and computer viruses. 

' Here we consider Markovian susccptiblc-infectious-susceptiblc (SIS) dynamics 

q . on finite strongly connected networks, applicable to several sexually transmitted 

diseases and computer viruses. By representing the model as a percolation 
process we provide a theoretical definition of invasion probability. We then 
O" 1 show that, for undirected networks, the probability of invasion from any given 

individual is equal to the (probabilistic) endemic prevalence, following successful 
invasion, at the individual (we also provide a relationship for the directed case); 
, the total (proportional) endemic prevalence in the population is thus equal to 

I/"*) ' the average invasion probability (across all individuals). Consequently, for such 

systems, the regions or individuals already supporting a high level of infection 
arc likely to be the source of a successful invasion by another infectious agent. 
This could be used to inform targeted interventions when there is a threat from 
' an emerging infection. 

m i 

Introduction 

'Compartmcntal' models pQ-@], m which interacting individuals exist in discrete 
; ^ \ states, for example: susceptible, infectious or recovered, capture the essence of 

■ 'real-world' host-to-host infection dynamics. Transition between states is often 

represented as a time-homogeneous Poisson process [2J, [3], which can depend 
on the states of other individuals. In the context of infinite fully-connected 
populations, where every individual interacts equally with every other individ- 
ual, many important results have been derived. For example, the Markovian 
susceptible- infectious-recovered (SIR) model exhibits an invasion threshold such 
that, depending on the combined effect of the rate at which individuals make 
infection-causing contacts and the rate at which infected individuals recover, 
either a non-zero fraction of the population is infected, which one can calcu- 
late and is often referred to as the 'final size', or the infection rapidly dies out 
(assuming a small number of initially infected individuals) [TJ. Likewise, the 
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Markovian susceptible-infectious-susceptible (SIS) model also exhibits an inva- 
sion threshold such that, depending on the same factors, the infection either 
persists at some constant endemic level or rapidly dies out [5]- The way in 
which these thresholds, and the final size and endemic prevalence, are affected 
when an immunisation process is included has been fruitfully investigated. By 
comparing these and similar models with real statistical data it has also been 
possible to quantify the invasion risk, and subsequent impact on the population 
if invasion does occur, for various diseases [B], [7]. Therefore, we have some way 
of determining optimum vaccination policies for the eradication or control of 
specific diseases. 

In reality, the probability of invasion from a single infectious individual 
clearly depends on that individual's particular relationships with others, i.e. 
some individuals are better connected than others. In order to capture such 
heterogeneity and thus make the models more realistic, the population can be 
represented as a contact network [5] which defines, for each individual, the sub- 
set of the population with which it has direct contact. In this context, if the 
population is assumed to be infinite such that the number of neighbours with 
which an individual has contact is described by some probability distribution, it 
is sometimes possible to compute the invasion probability and final size for the 
SIR model by utilising percolation theory [S]-[TD] . However, finite contact net- 
works pose conceptual difficulties; indeed, the theoretical definition of invasion 
depends on a framework which posits an infinite population (such that inva- 
sion corresponds to indefinite persistence) . Here, following the work of Harris 
[TT] , [12] , we will represent Markovian SIS dynamics on finite strongly connected 
networks as a percolation process (or random graph) and thereby provide a 
rigorously justified definition of invasion probability. We will also show that in 
this context there exists an exact relationship between individual-level invasion 
probability and endemic prevalence (the probability that a given individual is 
infected at any given time once the infection has 'become' endemic |13j). 

We will illustrate our results on a finite and extremely heterogeneous (strongly 
connected) contact network (network data SI, see supporting text SI §2 for de- 
tails). 

Markovian SIS Dynamics on a Contact Network 

In Markovian SIS dynamics, an individual is able to flip repeatedly between 
two states: susceptible and infectious. This happens via a locally influenced 
time-homogeneous Poisson infection process and an individual-specific time- 
homogeneous Poisson recovery process (on recovery an individual returns to 
the susceptible state). In the context of individuals interacting in this way on 
a regular square lattice the SIS model is also known as the contact process 
[TT] (see Liggett [14] for theoretical results, and Durrett and Levin [15] for an 
ecological perspective). Here we will examine the dynamics of the Markovian SIS 
model on the full set of networks which are finite, static and strongly connected. 
We will denote a generic weighted network by T where Tij indicates the rate 
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parameter of the Poisson process in which an infectious individual j infects a 
susceptible neighbour i £ {1,2,..., N} where N is the population size). We 
will also refer to a vector T = (71 , 72, . . . , 77V ) where ji is the rate of recovery for 
individual i when it is infectious. The weighted network T combines the rates at 
which the individuals interact with the probability that infection occurs when an 
infectious individual contacts a susceptible individual. In this way, T captures 
features of the network and the infectious agent (for more detail on network- 
based Markovian SIS dynamics see supporting text SI §1). This framework, 
which represents the dynamics of several sexually transmitted diseases [Td]-[T^] 
and computer viruses [20j-|23j. exhibits the phenomena of both invasion and 
endemic prevalence (See Durrett [21] for a discussion of methods and results 
relating to SIS dynamics on random scale- free networks). 

Definitions and Measurement 

Endemic Prevalence 

An immediate problem with finite systems is that there are no genuine sta- 
tionary distributions corresponding to endemic infection because the long-term 
behaviour is always guaranteed extinction (of the infectious agent). However, 
from a practical point of view, it is possible to obtain something like the endemic 
stationary distribution; figure la illustrates this clearly for our example network 
T ex (supporting text SI §2). While a genuine stationary distribution does not 
exist, it is clear that we can still measure something like endemic prevalence 
through stochastic simulation since ultimate extinction is, in most cases, very 
unlikely on short time scales. 

From a theoretical perspective, the situation can be made precise. Our 
system is finite and Markovian with a single absorbing state (extinction of the 
infection). Also, since we arc considering strongly connected networks, the 
transient states of our system form a single commuting class. In this case, if 
we initiate the system in a transient state and condition on the survival of the 
infection, the system tends to a unique distribution referred to as the quasi- 
stationary distribution (QSD) |25j-|27j (see appendix). Let us denote by A an 
arbitrary subset of the population represented by a strongly connected network 
T, and let T be the vector of individual-specific recovery parameters. We can 
now unambiguously define P^r (quasi-prevalcnce) to be the probability that at 
least one individual in subset A of the network is infectious in the QSD. The 
practical determination of this probability is generally clear but can gain errors 
in measurement where the network is very small or the overall prevalence is 
low or where it is difficult to determine when the infection has 'become' quasi- 
stationary (since in practice we cannot wait an infinite amount of time - see 
supporting text SI §3). 
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Figure 1: (a) A typical simulated infectious time line for Markovian SIS dy- 
namics on our example network T ex (network data SI, supporting text SI §2) 
where the outbreak was initiated on a single infectious individual. The weighted 
network matrix was multiplied by 0.01 and the recovery rate was set to unity 
for all individuals, (b) The number of infection events in 100 simulations of an 
outbreak, which were allowed to run up to a maximum of 300 infection events, 
initiated on the same individual each time. Again, the weighted network matrix 
was multiplied by 0.01 and the recovery rate was set to unity for all individuals. 
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Invasion Probability 

Although invasion probability [3S], is usually simple to measure (support- 
ing text SI §4), in the context of finite contact networks it has not been given 
a rigorous theoretical definition (see Nasell [3U] for a discussion of the thresh- 
old phenomenon in the stochastic SIS model). For dynamics with a genuine 
stationary endemic distribution, which (in this context) can only exist when 
the population size is infinite, the probability of invasion can be defined as the 
probability that the infection will persist indefinitely. For finite populations, 
this is complicated because we have to distinguish between infections that fail 
to invade and infections which successfully invade but then subsequently die 
out. By representing the dynamics as a percolation process (supporting text SI 
§5), we can rigorously justify that the quantifier of invasion probability, which 
is as equally meaningful and relevant as P^ r (quasi-prevalence), for outbreaks 
initiated on the members of a subset A of a network T (with associated vector 
T) can be defined: 

A , ■ • ■ s P£(t)(T,T) 

Prp r (quasi-invasion) = lim — rn 

P^ l \t)(T,T) 

where P£(t)(T, T) and P^ ll (t)(T, T) are the probability of survival to time t for 
outbreaks initiated at t = on the members of subset A and the full network 
(everything infected) respectively. Our definition of invasion can be seen to 
correspond to the intuitive notion of invasion i.e. the 'attainment' of the QSD 
(see appendix). 

If we generalise the quantity appearing in the denominator of the above def- 
inition of invasion probability, such that it is replaced by the probability that 
the infection survives to time t given that the initial state is that which max- 
imises the expected time to extinction, then the definition becomes applicable 
to any Markovian infection dynamics (on strongly connected networks) which 
permit relatively stable endemic behaviour, e.g. susceptible-infected-removed- 
susceptiblc (SIRS) dynamics. This is because, in such cases, there will be a 
unique and relatively 'stable' QSD which enables our definition to capture the 
probability of invasion in the same way as for SIS dynamics. 

In measuring the probability of invasion through stochastic simulation, the 
key requirement is separating major outbreaks from minor outbreaks. From a 
practical perspective, unless the network is very small or the threshold criteria 
for invasion is only just met, the behaviour of the system always splits clearly 
into small outbreaks or large outbreaks given the same initial configuration. 

Figure lb illustrates the typical output for outbreaks initiated by the same 
individual in the SIS framework. In general, the practical issues which emerge in 
measuring invasion probability by stochastic simulation for the SIS framework 
are minor (supporting text SI §4). 
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Results 



Prevalence-Invasion Relationship 

Our main result can be stated as the following (prevalence-invasion) relationship: 

Pt r (quasi-invasion) = P^ T r (quasi-prevalence) 

for any subset A of a weighted and strongly connected contact network T, con- 
ditional on Markovian SIS dynamics. Here T T denotes the transpose (reversed 
links) of T. 

The relationship implies the following important corollary where the subset 
A constitutes a single individual i and T is undirected: 

p (quasi-invasion) = P^ r (quasi-prevalence) 

that is, the probability of invasion from a given individual i is equal to the 
probability that it is infected in the QSD. After summing over all i on both 
sides, we also observe that the global prevalence in the QSD is equal to the 
population size multiplied by the average invasion probability (the probability 
of invasion given that the infection is seeded on a single individual selected 
uniformly at random). 

The proof of this relationship follows easily from a property known as 'dual- 
ity' which is normally expressed in terms of the contact process on undirected 
non- weighted networks [31], [32]. Duality follows from the graphical represen- 
tation of the contact process by Harris [12] and is also applicable to the general 
directed network case we consider here (supporting text SI §6). By combining 
duality with the definitions of quasi-invasion and quasi-prevalence above, the 
relationship can be proved (see appendix). 

Example 

Here, we illustrate our theoretical results on the largest strongly connected 
(5,119 node) component T ex (network data SI) of a particular heterogeneous 
transmission network. This network comprises 11,480 nodes and was generated 
from simulations on a complex model of the spread of H5N1 avian influenza 
through the British poultry flock [33], [3J]. It exhibits extensive heterogene- 
ity including complex spatial structures, heterogeneous transmission strengths 
varying over many orders of magnitude, clustering and directed links. It there- 
fore serves as a somewhat rare example of 'realistic' epidemic contact structures 
(supporting text SI §2). 

By varying a scalar multiplier of a network matrix T we can attempt to 
investigate infections of varying transmissibility spreading on the same network 
of individuals. Figure 2 illustrates our results for a single node in our example 
network T ex , clearly showing the relationship between invasion probability and 
endemic prevalence. 
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Figure 2: (a) Measurements of P^ T r (quasi- invasion) and 

r (quasi-prevalence) for a single individual (i = node 2332) in our ex- 
ample network T ex . The recovery rate was set to unity for all individuals while 
the multiplier of the network matrix was varied (supporting text SI §3 and 
4). The faint dashed line indicates equality. On this scale it is not possible 
to determine any deviation from the equality of the two quantities, (b) A 
'zoomed-in' view of the perpendicular deviation of each of the data points from 
the straight line (equality), in the bottom right to top left direction. 
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Discussion 



By considering the unique QSD associated with Markovian SIS dynamics on 
strongly connected networks, along with its implications under duality, we have 
provided a meaningful definition of invasion probability in this finite context. 
Utilising this definition, we have also provided a general statement of the exact 
relationship between invasion probability and endemic prevalence at the indi- 
vidual and population level, for any finite undirected network of arbitrary het- 
erogeneity (including undirected networks with weighted links and individual- 
specific recovery parameters). The relationship also has implications in the 
context of directed networks. 

We note that for two specific homogeneous networks (infinite square lattice 
and infinite 'great circle'), invasion probability can be shown to be equal to the 
fraction of the population infected in the (stationary) endemic situation [35], 
|36j . Furthermore, the relationship between the probability of long-term persis- 
tence and quasi-stationary distributions has previously been investigated (see 
Chaterjee and Durrett [37] and, for the related concept of 'metastability', sec 
Schonmann [35] and Simonis [31]). However, although the prevalence- invasion 
relationship follows easily from a combination of the QSD and duality, to our 
knowledge this is the first general statement of this exact relationship for any 
finite strongly connected network. We have thus related two fundamental epi- 
demiological quantifiers in systems where they cannot usually be calculated 
analytically due to complexity. 

It is generally easier to collect empirical data on endemic prevalence rather 
than directly on invasion risk. In the case of undirected networks prevalence 
data can thus be utilised to inform invasion risk. This method echos Anderson 
and May's [7] estimation of the 'basic reproductive ratio' of measles from the 
total number of susceptible individuals in England and Wales (using data from 
Fine and Clarkson [30] ). 

When other infectious agents exhibit qualitatively similar behaviour on the 
same undirected network, we can expect that the individuals carrying the great- 
est level of endemic infection are also those most likely to initiate new successful 
invasions. This supports the targeting of high-risk individuals, in these systems, 
as an effective strategy for the mitigation and control of emerging epidemics. 

Appendix 

The Practical Relevance of P^ r (quasi-prevalence) 

Daroch and Seneta [25], [26] have shown that, for a continuous-time Markov 
chain with absorbing states, and with a finite set of transient states that form a 
single commuting class, there is a unique distribution towards which the chain 
will converge when conditioning on non-absorption (and this convergence will 
occur, given non-absorption, when any of the transient states is the initial state). 
This distribution is known as the 'quasi-stationary' distribution (QSD). Thus, 



for a strongly connected network in which infection can be transmitted, via 
some route, from any individual to any other individual, a unique QSD for the 
Markovian SIS model exists. 

It can be argued that the quasi-stationary distribution has practical rele- 
vance (i.e. is a good 'representation' of the endemic situation [57]) if the rate of 
convergence to this distribution, when conditioning on non-absorption, is rapid 
compared to the rate at which the system decays to inevitable absorption when 
it is 'initiated' in the QSD [55]. This is usually the case for SIS dynamics for 
which, according to Nasell [57J, 'it is easy to find examples where the expeced 
time to extinction even for a rather small population exceeds the age of the 
universe'. More specifically, Simonis [33] has shown that the contact process 
on large but finite multi-dimensional homogeneous square lattices, where the 
initial state is all-infected, will be near to the stationary distribution of the cor- 
responding infinite process, restricted to the finite set, for most of its lifetime 
(assuming the corresponding infinite process is supercritical). 

Adopting the graphical notation of Harris [TT] (supporting text SI §5.3), let 
us consider the following quantities: 

1. Pt,t{& H A ^ 0) = The probability that at least one member of subset 
A is infected at time t given that all individuals are infected at t = 0. 

2. Py,r(£t fl V 7^ 0) = The probability that the infection survives to time t 
given that all individuals are infected at t = 0. 

3. P T ,r(£t n A + ®)/ p T,r(€t H V ^ 0) = The probability that at least one 
member of subset A is infected at time t given that all individuals are 
infected at t = and given that the infection survives to time t. 

Here, V is the set of all individuals, such that A C V, and £^ is the set of 
infected individuals at time t when only (all) the members of A are infected 
at t = 0. Pt,t denotes the probability measure for the case where the model 
is parametrised by T and T. Note that, in the limit as t — > oo, quantity 3 is 
equal to P^ T (quasi-prevalence) (the probability that at least one member of A 
is infected in the QSD). 

In figure 3a, the way in which these three quantities may vary with respect 
to time is illustrated for a scenario in which the QSD has practical relevance. 
In such a scenario, the quantifier P^r (quasi-prevalence) is able to capture the 
value at which Pr,r((,¥ H A 7^ 0) initially 'plateaus' before its slow decay to zero. 

We have defined P^ r (quasi-prevalence) in terms of the process which oc- 
curs when all individuals are initially infected and then considered the relevant 
quantities 1 to 3 (V is the superscript in all three quantities). We could, how- 
ever, define P^ r (quasi-prevalence) in terms of the process which occurs when 
only the members of B C V are initially infected since the QSD, as defined by 
Daroch and Seneta [55], [55], is independent of initial conditions. 

Nonetheless, the process which occurs when all individuals are initially in- 
fected is unique in that it has the maximum expected time to extinction across 
all possible initial states (this follows from £f C £Y - see, for example, Grimmett 
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[35]). Therefore, it is this non-conditioned process which is most appropriately 
described by the QSD. 

The Practical Relevance of r (quasi-invasion) and its Jus- 
tification as a Quantifier of Invasion Probability 

The property of 'duality' pertaining to Markovian SIS dynamics (and the contact 
process) can be expressed as follows: 

n B + 0) = P T T, T (tf n A 0) 

where A, B C V (V is the set of all individuals) , T is an arbitrary weighted 
contact network and T T is the transpose of T (the directions of the Poisson 
infection processes are reversed). Note that for undirected networks T = T T . 

Duality for the contact process was proved independently by Holley & Liggett 
[3"T] and Harris [35] , but the relationship is usually stated in terms of the process 
on undirected non- weighted networks. However, the same reasoning implies the 
relationship stated above (supporting text SI §6). In the case of undirected net- 
works the contact process is said to be 'self-dual'. In words the theorem says: 
The probability that at least one member of A is infected at time t, given that 
only (all) the members of B are infected at t — 0, is equal to the probability 
that at least one member of B is infected at time t, given that only (all) the 
members of A are infected at t = and given that the transmission processes 
are reversed (i.e. in T T ). 

Let us consider the following quantities: 

4. Pr,r(£f H V 7^ 0) = The probability that the infection survives to time t 
given that only the members of A are infected at t = 0. 

5. Pt,t{^Y H 7 ^ 8) = The probability that the infection survives to time t 
given that all individuals are infected at t = 0. 

6. P T .v{it n V ^ 0)/P T ,rfe y n V + ) = Thc q uoticnt of quantities 4 and 5. 

It follows from duality that the three quantities, 4, 5 and 6 above, are all equal 
respectively to the three quantities, 1, 2 and 3 of the previous section, provided 
we assume T in the above quantities to be the transpose of the network consid- 
ered in the previous section. Note also that, in the limit as t — > oo, quantity 6 
is equal to P^ r (quasi- invasion). 

Quantity 4 denotes the survival probability up to time t. We see in figure 
3b that this plateaus in exactly the same way as quantity 1 for the transposed 
network (figure 3a). This plateau, which is captured by quantity 6 in the limit 
as t — > oo i.e. (quasi- invasion), corresponds to thc 'achievement' of the 

QSD and thus with successful invasion. 

Note that the definition of P^r (quasi- invasion) implies that invasion is cer- 
tain when all individuals arc initially infected. In a specific scenario, the extent 
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Figure 3: Here we illustrate how the quantifiers r (quasi-prevalence) and 
-Pt r (quasi- invasion) are able to capture critical information about the model 
(for the case where the QSD has practical relevance). For illustrative purposes, 
we have significantly exaggerated the rate at which quantities 1, 2, 4 and 5 
(eventually) tend towards zero (i.e. the rate at which the system tends towards 
absorption when 'initiated' in the QSD). 



to which this implication is valid is the extent to which the definition is mean- 
ingful for any A (this is equivalent to the conditions under which the QSD has 
practical relevance). 

Since quantities 6 and 3 are equal (if one relates to T, the other to T T ), we 
have the prevalence-invasion relationship. 
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Supporting Text: 



1 Markovian SIS Dynamics on a Contact Net- 
work 

In Markovian SIS dynamics [1] on contact networks individuals can be in one of 
two states: susceptible or infectious. Susceptible individuals are at immediate 
risk of infection when one or more of a defined set of neighbours are infectious 
(we assume that infection is only possible between specified pairs of individuals 
and, even then, may only be possible in one direction). All infectious individu- 
als eventually recover and immediately become re-susceptible to the infection. 
The infection and recovery processes are modelled as time-homogeneous Poisson 
processes, and so the system is a continuous-time Markov chain. 

In epidemiology, a contact network [2] can be represented as an adjacency 
matrix G, such that Gij = 1 if it is possible for infection to pass directly 
from individual j to individual i, and is zero otherwise. The non-zero elements 
of G can be weighted in order to capture heterogeneity in the likelihood of 
transmission between pairs. In this case, we will refer to a weighted contact 
network T, where Tij is the rate parameter of the Poisson process whereby j 
infects i (when i is susceptible and j is infectious). The order of T will thus be 
N x N where N is the total population size. We will also refer to a vector of 
recovery parameters T = (71, 72, ■ • ■ , 7at), such that 7$ is the parameter for the 
exponentially distributed infectious period (or time until recovery) of individual 

1 when it is infectious. 

Let Aj be the infectious pressure on individual i, such that the time un- 
til infection, for i, is exponentially distributed with parameter A^. Let \ii be 
the parameter for the exponentially distributed infectious period (or time until 
recovery) for individual i. We can now define the stochastic model with the 
following equations: 

TV 

Aj = ^ I j Si 
0=1 
Hi = jili 

where Ii = 1 if % is infectious and Si—1 if i is susceptible, and both are zero 
otherwise. 

Given an initial configuration, such that the state of each individual is known, 
and assuming T and T are also known, direct simulation can be employed to 
produce statistically accurate realisations of the stochastic model. 

2 Our Example Network 

The network which we use to illustrate our theoretical results is based on the 
simulations carried out by Jonkers et al. [3] to investigate different epidemic 
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control strategies and virulence for H5N1 avian influenza in the British poultry 
flock. These simulations were performed using a model of the British poultry 
industry detailed in Sharkey et al [1] . In total, over 40 million simulations were 
made in this study, each starting with an individual selected uniformly at ran- 
dom. For each of these simulations, every transmission event was logged by 
recording the transmitter node and the receiver node. The network was then 
encoded in a matrix totalling the number of transmission events between the 
nodes in all of the simulations. This comprises over 9 million transmission events 
in total (many simulations failed to generate any transmission events). While 
this is not intended to represent avian influenza with a particular control strat- 
egy or a particular virulence, it qualitatively represents the heterogeneities and 
transmission risk profiles of the industry. Given the detail and complexity of the 
original model, this represents a transmission network with 'realistic' properties. 
Tex is a matrix representing the largest strongly connected component of the 
full network described above and is provided in network data SI (on each line of 
the document there is a list of three numbers which together describe a possible 
transmission route. The first number is the label for individual j, the second is 
the label for individual i and the third gives X^-. See §1 for the definition of a 
network matrix T). 



The probability Pj, r (quasi-prevalence) that individual i is infectious in the 
QSD, i.e. in the endemic situation, is equal to the proportion of time for which 
i is infected after the system has 'reached' this distribution, assuming the infec- 
tion does not die out (the subscripts T and Y merely parameterise the model - 
see §1). Therefore, we measure this as 



where n is a large number of simulated consecutive global events which occur 
when the probabilities for the system states obey the QSD. Atk is the simulated 
time between the (k — l) th event and the k th event, and if = 1 if i is in the 
infectious state between these events but is zero otherwise. The total simulated 
time is denoted by r = Y^k=i Atfc. Obviously, the larger we can make n, the 
more exact our measurement becomes. Notice that the events corresponding to 
a change in the state of individual i, and thus to a change in the value of if, 
represent a very small fraction of the total global simulated events (assuming 
the population size is not extremely small). 

In practice, we run the simulation for a sufficiently long time such that 
the system is likely to be described by the QSD before we start to compute 
P?p r (quasi-prevalence) . For example, were our simulation to produce an infec- 
tious timeline like the one in figure la we would only include the data from 
after, say, t = 30 in our computation. 



3 Measuring P^ T (quasi-prevalence) 
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Note that in every simulation there is always the possibility of extinction, 
even after the QSD has been 'reached'. However, we can safely discard such sim- 
ulations since the quantity we are interested in is conditioned on non-extinction. 

In producing figure 2, 100 simulations were run for each of 20 different mul- 
tipliers of T ex , an d in each simulation the first 10 million events, out of 11 
million, were discarded. The remaining data was used for our measurement 
of . r (quasi-prevalencc) (i = 2332). Each simulation was initiated with 
sufficient infected individuals such that the probability of early extinction was 
negligible. As extinction never occurred, none of the simulations were discarded. 
This reinforces the practical relevance of the QSD for SIS dynamics. 

4 Measuring P^ T (quasi- invasion) 

In mean-field theory, when considering infinite fully connected populations, the 
probability of invasion from a single infected individual in an otherwise suscep- 
tible population corresponds to the probability that, given such an initial state, 
the infection persists indefinitely. However, in our network context, we have 
a finite state space with a single absorbing state (extinction) that is accessible 
from all of the transient states. This means that it is impossible for the infection 
to persist indefinitely. Therefore, we instead look for dichotomised behaviour in 
relation to the time until extinction by carrying out large numbers of simula- 
tions. We measure P^ r (quasi-invasion) as the fraction of simulations in which 
an outbreak initiated by individual i persists for some sufficiently large num- 
ber Ej of infection events, such that a clear distinction can be made between 
the simulations in which long-term persistence occurs, and the ones in which it 
does not (i.e. we associate long-term persistence with the 'attainment' of the 
QSD and hence with quasi-invasion). Without observation of this dichotomised 
behaviour we cannot justify an identification of long-term persistence in any 
given simulation. For example, the histogram in figure lb enables us to identify 
the simulations which exhibit long term persistence, even though we have only 
tracked each simulation to the 300th infection event i.e. Ej = 300. If such 
a histogram cannot be produced from the simulations, Ei can be increased in 
order to try and 'uncover' the dichotomised behaviour. If this still fails then we 
cannot justify this method of measurement. 

Typically, P l s (t)(T 1 T)/P^ ll (t)(T,T) converges so rapidly that it closely ap- 
proximates P^ r (quasi-invasion) for relatively small t, where the contribution of 
the denominator is negligible. We look for the dichotomised behaviour as an 
indication that we are considering a sufficiently long period of time such that 
this close convergence has already occurred. 

In producing figure 2, 1 million simulations were run for each of the 20 
multipliers of T^ x which were investigated. For each simulation, the infection 
was initiated on individual i = 2332 and the process was tracked for a maximum 
of 500 infection events. The resulting data was used for the measurement of 
P* T (quasi-invasion). 



17 



5 Oriented Percolation for Markovian SIS Dy- 
namics on Contact Networks 



5.1 The Representation Graph S 

Graphical representation and percolation techniques for the study of the contact 
process were first introduced by Harris [5], [S]. These techniques were then 
utilised and developed by probabilists such as Grimmett [7J, Liggett [5] and 
Durrett [Hj • To explain these ideas we first construct a graph S such that when 
its edges are removed according to specific probabilities we capture the statistics 
of Markovian SIS outcomes for a given T and F (and any initial configuration). 
The graph S is constructed as follows (an example is given in figure SI): 

Assume that we are interested in a time period r and that we are considering 
a population of N individuals in a weighted contact network T. Now, assuming 
that we also wish each node in S to represent an individual at a particular point 
in time, and that we are considering regular time intervals of size h, then S will 
need to consist of N(x + 1) nodes (assuming that v eN). We now label the 
nodes corresponding to the N individuals at t — as 1^ to N^-°\ and the nodes 
corresponding to the individuals at t — h as 1W to N^ h \ and so on, such that 
node i( ah } corresponds to individual i at time ah (a S {0, 1,2,..., r/h}). 

We then place a directed edge from node v& to node i^ t+h ^ for every i g 
{1, 2,3,..., N} and every t € {0, h, 2h, . . . , r — h}. These edges are labelled 
'Recovery' edges and represent possible infection routes in the sense that an 
infected individual at time t can 'transmit' the infection to time t + hhy simply 
not recovering in the intervening period, and nodes and represent 
the same individual at times t and t + h respectively. 'Recovery^ edges thus 
correspond to the capacity of individual i, being infectious at time t, to still be 
infectious at time t + h. 

For every possible ordered pairing of i and j, where i ^ j and i,j € 
{1,2,3, ... ,N}, we also place a directed edge from node to node 
for every t £ {0, h, 2h, ...,T — h}. These edges are labelled 'Infection' edges. 
'Infection'^ edges represent the possible infection routes corresponding to the 
capacity of individual j, being infectious at time t, to 'cause' individual i to be 
infectious at time t + h (if i is otherwise susceptible) . 

5.2 Post-Removal Networks 

The probability that an infectious individual i will recover between time t 
and time t + h is J 7ie~ 7i *d£, and this is true independently of the time at 
which the individual was originally infected. This is due to the fact that the 
recovery process is Poisson. Likewise, the probability P^ that individual j, 
being infectious at time t, will cause individual i to be infected at time t + h, if 
i is otherwise susceptible, is given by J Tj.,e _Tw *di. 

We note that as h — > (but remains positive) we have Pf 1 — » 7,/i and 
PL — > Tijh. We also note that the probability of more than one event happening 
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in a time interval of size h approaches zero. 

Now, having made h infinitesimal, we remove every edge of S according to 
specific probabilities. Each ' Recovery \ edge is removed with probability P^ = 
7i/i, and each 'Infection'^- edge is removed with probability 1 — PL = 1 — T^h. 

The resulting post-removal network captures the statistics of SIS outcomes 
on our contact network. For instance, the probability that a path exists from a 
particular node to any node in <E {1,2,3,..., N}) in the post-removal 
network is equal to the probability that an infection starting with individual j 
at t = will survive until t = r. 

5.3 Useful Notation 

Assume that we are interested in a graph/network G where V is the set of 
all nodes/individuals. Now, adopting the notation of Harris [5], let £f be 
the stochastic state of the system for the Markovian SIS model (or contact 
process) on the graph/network G at time t, given that A C V is the set of 
nodes/individuals that are initially infected at t = (all other individuals are 
initially susceptible). Therefore, £f is a random 0/1 vector of N elements such 
that the ith element £,f(i) = 1 if individual i is infected at time t, whereas 
= if individual i is susceptible at time t (assuming only the members 
of A are infected at t = 0). Note that there is a one-to-one correspondence be- 
tween the set of possible stochastic system states and the set of possible sets of 
infectious individuals (since if we know which individuals are infectious, we also 
know which are susceptible, and hence the state of the system) and so, follow- 
ing Grimmett [7J, we will use sets and vectors interchangeably in this respect. 
Therefore, the event that i is infectious at time t given that only individual 
j is infected at t = 0, this being equivalent to the existence of a contiguous 
path from to in the corresponding post-removal network (see previous 
section), is denoted by n {i} ^ 0. If the realisation of the post-removal 
network is 'modelling' Markovian SIS dynamics for a weighted contact network 
T (G is defined by T) , in which each individual has also been assigned a specific 
recovery parameter, via the vector T, we will use Pr,r to denote the appropriate 
probability measure. Thus, the probability that individual i will be infected at 
time t, given that individual j is the single initial infected at t = 0, can then be 
expressed as P Ti r(£t j} n {i} ^ 0). 

6 The Duality Property for General Networks 

The property of duality can be expressed: 

PTTiHf n B jk 0) = P T T iF (tf n A jk 0) 

where A, B C V (V is the set of all individuals), T is an arbitrary weighted 
contact network and T T is the transpose of T (the directions of the Poisson 
infection processes are reversed). Note that for undirected networks T = T T . 
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S1 S2 




Figure SI: Here, the representation graph 5 has been constructed twice (51, 52) 
for a network of three individuals between t = and t = Ah. In the graph on 
the left (51), all paths from are shown. On the right (52), all paths to 
are shown. A 'corresponding' pair of paths are shown in bold. 

Duality for the contact process was proved independently by Holley & Liggett 
[TU] and Harris [TT| , but the relationship is usually stated in terms of the process 
on undirected non-weighted networks. 

By inspection of the representation graph 5, and by considering possible 
post-removal networks, it can be shown that the duality property holds in the 
form presented above for Markovian SIS dynamics on finite networks of any 
structure, where we can also allow ordered-pair-specific transmission parameters 
and individual-specific recovery parameters. 

In Figure SI, 5 has been constructed twice (51, 52) for a network of three 
individuals, i, j and k, between t = and t = Ah (N = 3, V = k}). The 
solid lines in the graph on the left show the possible paths (in 51) from the 
specific node to any node rS ih ^> (n £ V) and the bold line represents a 
particular one of these which may exist in the post-removal network. The graph 
on the right does the same for paths (in 52) from any node rv-°> (n € V) to 
the specific node p Ah > . The bold paths illustrate the fact that for every path 
from to any node nS Ah > in 51 (n £ V) there is a 'corresponding' path in 52, 
which consists of the same number of 'Recovery'^ edges (Vn £ V) and the same 
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number of 'Infection'„ in2 edges as the former has 'Infection'„ 2 „ 1 edges (for each 
ordered pairing of n\ and n%, n\ ^ n% (n^rii £ V)). This 'correspondence' is 
due to the symmetry of 5 and the one set of possible paths being a reflection of 
the other. 

Since PL = Tijh (for h — > 0) can be asymmetric, the two paths of each 
'corresponding' pair are present in the post-removal networks with the same 
probability, if the removal probabilities for 51 are computed using T while the 
removal probabilities for 52 are computed using T T (or vice versa) . This means 
that, due to the symmetry in structure between 51 and 52, the probability 
that at least one of the possible paths depicted in 51 exists in the post-removal 
network is the same as the corresponding probability for 52. Moreover, the 
computation for determining the probability that there will be at least one path 
in 51 going forwards from to any node n^' (n 6 V, t £ {h,2h,3h,4h}) is 
exactly the same as the computation for determining the probability of there 
being at least one path in 52 going backwards from j^ ih ^ to any node n^ h ~ 1 ^ 
(if T and T T are used respectively to compute the removal probabilities for 51 
and 52). Thus: 

Pt,t{& } n V + 0) = p T tA£ n {j} + 0) 

The more general form of the duality theorem stated above can now be obtained 
by considering A, B C V. Using the same arguments, the probability that there 
will be at least one path in 51 going forwards from any node (a <G A) to 
any node (b £ B), in the post removal network, is equal to the probability 
that there is at least one path in 52 going from any node b^ to any node 
(if T and T T are used respectively to compute the removal probabilities for 51 
and 52). 
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